Pattern Matching in Huffman Encoded Texts
Similar Resources
Processing Text Files as Is: Pattern Matching over Compressed Texts, Multi-byte Character Texts, and Semi-structured Texts
Techniques in processing text files “as is” are presented, in which given text files are processed without modification. The compressed pattern matching problem, first defined by Amir and Benson (1992), is a good example of the “as-is” principle. Another example is string matching over multi-byte character texts, which is a significant problem common to oriental languages such as Japanese, Kore...
Speeding Up String Pattern Matching by Text Compression: The Dawn of a New Era
This paper describes our recent studies on string pattern matching in compressed texts mainly from practical viewpoints. The aim is to speed up the string pattern matching task, in comparison with an ordinary search over the original texts. We have successfully developed (1) an AC type algorithm for searching in Huffman encoded files, and (2) a KMP type algorithm and (3) a BM type algorithm for...
Adapting Boyer-Moore-Like Algorithms for Searching Huffman Encoded Texts
In this paper we propose an efficient approach to the compressed string matching problem on Huffman encoded texts, based on the Boyer-Moore strategy. Once a candidate valid shift has been located, a subsequent verification phase checks whether the shift is codeword aligned by taking advantage of the skeleton tree data structure. Our approach leads to algorithms that exhibit a sublinear behavior...
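The codeword-alignment issue mentioned in this abstract can be illustrated with a small sketch. It assumes a hypothetical three-symbol prefix code, and it uses a plain substring search over the bit string plus a boundary check obtained by decoding from the start; it does not implement the Boyer-Moore scan or the skeleton tree used in the paper.

```python
# Hypothetical prefix code chosen for illustration (not taken from the paper).
CODE = {"a": "0", "b": "10", "c": "11"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(text):
    """Encode text as a string of '0'/'1' characters."""
    return "".join(CODE[ch] for ch in text)


def codeword_boundaries(bits):
    """Return the set of bit offsets at which a codeword starts."""
    starts, current = set(), ""
    for i, bit in enumerate(bits):
        if not current:
            starts.add(i)
        current += bit
        if current in DECODE:
            current = ""
    return starts


def search_compressed(bits, pattern):
    """Find encode(pattern) in bits.

    Returns (aligned, false_matches): offsets that start on a codeword
    boundary, and offsets that only match at the bit level.
    """
    encoded_pattern = encode(pattern)
    starts = codeword_boundaries(bits)
    aligned, false_matches = [], []
    i = bits.find(encoded_pattern)
    while i != -1:
        (aligned if i in starts else false_matches).append(i)
        i = bits.find(encoded_pattern, i + 1)
    return aligned, false_matches
```

For example, searching for "b" in the encoding of "cab" finds the bit pattern twice, but only one occurrence begins on a codeword boundary; the other straddles the end of "c" and the start of "a" and must be rejected by the verification step.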
More Speed and More Compression: Accelerating Pattern Matching by Text Compression
This paper addresses the problem of speeding up string matching by text compression, and presents a compressed pattern matching (CPM) algorithm which finds a pattern within a text given as a collage system 〈D,S〉 such that variable sequence S is encoded by byte-oriented Huffman coding. The compression ratio is high compared with existing CPM algorithms addressing the problem, and the search time...
Processing of Huffman Compressed Texts with a Super-Alphabet
We present an efficient algorithm for scanning Huffman compressed texts. The algorithm parses the compressed text in O(n log2 σ / b) time, where n is the size of the compressed text in bytes, σ is the size of the alphabet, and b is a user specified parameter. The method uses a variable size super-alphabet, with an average size of O(b / (H log2 σ)) symbols, where H is the entropy of the text. Each ...
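One way to picture processing several bits per step is a lookup table indexed by the current partial codeword and the next b-bit chunk. The sketch below is a simplified table-driven decoder under assumptions of its own: a hypothetical complete prefix code, a bit string whose length is a multiple of b, and helper names (`build_table`, `decode_chunks`) invented for illustration. It is not the super-alphabet algorithm of the paper.

```python
from itertools import product

# Hypothetical complete prefix code chosen for illustration.
CODE = {"a": "0", "b": "10", "c": "11"}
DECODE = {bits: sym for sym, bits in CODE.items()}
# A state is a proper prefix of some codeword (the empty prefix included).
STATES = {cw[:i] for cw in CODE.values() for i in range(len(cw))}


def build_table(b):
    """Map (state, b-bit chunk) to (decoded symbols, next state)."""
    table = {}
    for state in STATES:
        for chunk in map("".join, product("01", repeat=b)):
            current, out = state, []
            for bit in chunk:
                current += bit
                if current in DECODE:
                    out.append(DECODE[current])
                    current = ""
            table[state, chunk] = ("".join(out), current)
    return table


def decode_chunks(bits, b, table):
    """Decode bits with one table lookup per b-bit chunk.

    Assumes len(bits) is a multiple of b.
    """
    state, out = "", []
    for i in range(0, len(bits), b):
        symbols, state = table[state, bits[i:i + b]]
        out.append(symbols)
    return "".join(out)
```

With b = 4, the bit string 11010100 (the encoding of "cabba" under this code) is decoded in two lookups instead of eight single-bit steps. The table grows as 2^b entries per state, which is the trade-off a user-specified b controls in this sketch.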